(54) Improved video coding methods and apparatuses



(57) Video coding methods and apparatuses are provided that make use of various models and/or modes
to significantly improve coding efficiency, especially for high/complex motion sequences. The methods and
apparatuses take advantage of the temporal and/or spatial correlations that may exist within portions of the
frames, e.g., at the Macroblock level, etc. The methods and apparatuses tend to significantly reduce the amount
of data required for encoding motion information while retaining or even improving video image quality.

Description

RELATED PATENT APPLICATIONS 

[0001] This U.S. Non-provisional Application for Letters Patent claims the benefit of priority from, and hereby
incorporates by reference the entire disclosure of, co-pending U.S. Provisional Application for Letters Patent Serial No.
60/376,005, filed April 26, 2002, and titled "Video Coding Methods and Arrangements".

[0002] This U.S. Non-provisional Application for Letters Patent further claims the benefit of priority from, and hereby
incorporates by reference the entire disclosure of, co-pending U.S. Provisional Application for Letters Patent Serial
No. 60/352,127, filed January 25, 2002.

TECHNICAL FIELD

[0003] This invention relates to video coding, and more particularly to methods and apparatuses for providing
improved coding and/or prediction techniques associated with different types of video data.

BACKGROUND 

[0004] The motivation for increased coding efficiency in video coding has led to the adoption in the Joint Video Team
(JVT) (a standards body) of more refined and complicated models and modes describing motion information for a given
macroblock. These models and modes tend to take better advantage of the temporal redundancies that may exist
within a video sequence. See, for example, ITU-T, Video Coding Expert Group (VCEG), "JVT Coding - (ITU-T H.26L
& ISO/IEC JTC1 Standard) - Working Draft Number 2 (WD-2)", ITU-T JVT-B118, Mar. 2002; and/or Heiko Schwarz
and Thomas Wiegand, "Tree-structured macroblock partition", Doc. VCEG-N17, Dec. 2001.
[0005] The recent models include, for example, multi-frame indexing of the motion vectors, increased sub-pixel
accuracy, multi-referencing, and tree structured macroblock and motion assignment, according to which different sub
areas of a macroblock are assigned to different motion information. Unfortunately these models tend to also significantly
increase the required percentage of bits for the encoding of motion information within a sequence. Thus, in some cases
the models tend to reduce the efficacy of such coding methods.

[0006] Even though, in some cases, motion vectors are differentially encoded versus a spatial predictor, or even
skipped in the case of zero motion while having no residue image to transmit, this does not appear to be sufficient for
improved efficiency.

[0007] It would, therefore, be advantageous to further reduce the bits required for the encoding of motion information, 
and thus of the entire sequence, while at the same time not significantly affecting quality. 
[0008] Another problem that is also introduced by the adoption of such models and modes is that of determining the
best mode among all possible choices, for example, given a goal bitrate, encoding/quantization parameters, etc.
Currently, this problem can be partially solved by the use of cost measures/penalties depending on the mode and/or the
quantization to be used, or even by employing Rate Distortion Optimization techniques with the goal of minimizing a
Lagrangian function.

[0009] Such problems and others become even more significant, however, in the case of Bidirectionally Predictive
(B) frames where a macroblock may be predicted from both future and past frames. This essentially means that an
even larger percentage of bits may be required for the encoding of motion vectors.

[0010] Hence, there is a need for improved methods and apparatuses for use in coding (e.g., encoding and/or
decoding) video data.


SUMMARY 

[0011] Video coding methods and apparatuses are provided that make use of various models and/or modes to
significantly improve coding efficiency, especially for high/complex motion sequences. The methods and apparatuses take
advantage of the temporal and/or spatial correlations that may exist within portions of the frames, e.g., at the Macroblock
level, etc. The methods and apparatuses tend to significantly reduce the amount of data required for encoding motion
information while retaining or even improving video image quality.

[0012] Thus, by way of example, in accordance with certain implementations of the present invention, a method for
use in encoding video data within a sequence of video frames is provided. The method includes encoding at least a
portion of a reference frame to include motion information associated with the portion of the reference frame. The
method further includes defining at least a portion of at least one predictable frame that includes video data predictively
correlated to the portion of the reference frame based on the motion information, and encoding at least the portion of
the predictable frame without including corresponding motion information, but including mode identifying data that
identifies that the portion of the predictable frame can be directly derived using the motion information associated with
the portion of the reference frame.

[0013] An apparatus for use in encoding video data for a sequence of video frames into a plurality of video frames
including at least one predictable frame is also provided. Here, for example, the apparatus includes memory and logic,
wherein the logic is configured to encode at least a portion of at least one reference frame to include motion information
associated with the portion of the reference frame. The logic also determines at least a portion of at least one predictable
frame that includes video data predictively correlated to the portion of the reference frame based on the motion
information, and encodes at least the portion of the predictable frame such that mode identifying data is provided to specify
that the portion of the predictable frame can be derived using the motion information associated with the portion of the
reference frame.

[0014] In accordance with still other exemplary implementations, a method is provided for use in decoding encoded
video data that includes at least one predictable video frame. The method includes determining motion information
associated with at least a portion of at least one reference frame and buffering the motion information. The method
also includes determining mode identifying data that identifies that at least a portion of a predictable frame can be
directly derived using at least the buffered motion information, and generating the portion of the predictable frame using
the buffered motion information.

[0015] An apparatus is also provided for decoding video data. The apparatus includes memory and logic, wherein
the logic is configured to buffer in the memory motion information associated with at least a portion of at least one
reference frame, ascertain mode identifying data that identifies that at least a portion of a predictable frame can be
directly derived using at least the buffered motion information, and generate the portion of the predictable frame using
the buffered motion information.
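
By way of illustration only, the decoding flow summarized above (buffer the reference frame's motion information, read the mode identifying data, then derive motion for the predictable frame) might be sketched in Python as follows. The class name `DirectModeDecoder`, its method names, the per-macroblock dictionary used as the buffer, and the zero-motion fallback for blocks with no buffered vector are all assumptions made for this sketch and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

MotionVector = Tuple[int, int]   # (dx, dy); the units are an assumption
BlockIndex = Tuple[int, int]     # (row, column) of a macroblock


@dataclass
class DirectModeDecoder:
    """Hypothetical decoder state holding buffered reference-frame motion."""
    mv_buffer: Dict[BlockIndex, MotionVector] = field(default_factory=dict)

    def decode_reference_block(self, block: BlockIndex, mv: MotionVector) -> MotionVector:
        # Motion information arrives with the reference frame; buffer it.
        self.mv_buffer[block] = mv
        return mv

    def decode_predictable_block(
        self,
        block: BlockIndex,
        is_direct: bool,
        transmitted_mv: Optional[MotionVector] = None,
    ) -> MotionVector:
        if is_direct:
            # Mode identifying data says "direct": no vector was transmitted,
            # so reuse the buffered one (zero motion if none, an assumption).
            return self.mv_buffer.get(block, (0, 0))
        if transmitted_mv is None:
            raise ValueError("non-direct block requires a transmitted vector")
        return transmitted_mv
```

In this sketch the buffered vector is reused unchanged; an implementation could equally scale or refine it before use.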



BRIEF DESCRIPTION OF THE DRAWINGS 



[0016] The present invention is illustrated by way of example and not limitation in the figures of the accompanying 
drawings. The same numbers are used throughout the figures to reference like components and/or features. 

Fig. 1 is a block diagram depicting an exemplary computing environment that is suitable for use with certain
implementations of the present invention.

Fig. 2 is a block diagram depicting an exemplary representative device that is suitable for use with certain
implementations of the present invention.

Fig. 3 is an illustrative diagram depicting a Direct Motion Projection technique suitable for use in B Frame coding,
in accordance with certain exemplary implementations of the present invention.

Fig. 4 is an illustrative diagram depicting Direct P and B coding techniques within a sequence of video frames,
in accordance with certain exemplary implementations of the present invention.

Fig. 5 is an illustrative diagram depicting Direct Motion Prediction for collocated macroblocks having identical
motion information, in accordance with certain exemplary implementations of the present invention.

Fig. 6 is an illustrative diagram depicting the usage of acceleration information in Direct Motion Projection, in
accordance with certain exemplary implementations of the present invention.

Fig. 7 is an illustrative diagram depicting a Direct Pixel Projection technique suitable for use in B Frame coding,
in accordance with certain exemplary implementations of the present invention.

Fig. 8 is an illustrative diagram depicting a Direct Pixel Projection technique suitable for use in P Frame coding,
in accordance with certain exemplary implementations of the present invention.

Fig. 9 is a block diagram depicting an exemplary conventional video encoder.

Fig. 10 is a block diagram depicting an exemplary conventional video decoder.

Fig. 11 is a block diagram depicting an exemplary improved video encoder using Direct Prediction, in accordance
with certain exemplary implementations of the present invention.

Fig. 12 is a block diagram depicting an exemplary improved video decoder using Direct Prediction, in accordance
with certain exemplary implementations of the present invention.

Fig. 13 is an illustrative diagram depicting a Direct Pixel/Block Projection technique, in accordance with certain
exemplary implementations of the present invention.

Fig. 14 is an illustrative diagram depicting a Direct Motion Projection technique suitable for use in B Frame coding,
in accordance with certain exemplary implementations of the present invention.

Fig. 15 is an illustrative diagram depicting motion vector predictions, in accordance with certain exemplary
implementations of the present invention.

Fig. 16 is an illustrative diagram depicting interlace coding techniques for P frames, in accordance with certain
exemplary implementations of the present invention.

Fig. 17 is an illustrative diagram depicting interlace coding techniques for B frames, in accordance with certain
exemplary implementations of the present invention.

Fig. 18 is an illustrative diagram depicting interlace coding techniques using frame and field based coding, in
accordance with certain exemplary implementations of the present invention.

Fig. 19 is an illustrative diagram depicting a scheme for coding joint field/frame images, in accordance with certain
exemplary implementations of the present invention.

DETAILED DESCRIPTION 

[0017] In accordance with certain aspects of the present invention, methods and apparatuses are provided for coding
(e.g., encoding and/or decoding) video data. The methods and apparatuses can be configured to enhance the coding
efficiency of "interlace" or progressive video coding streaming technologies. In certain implementations, for example,
with regard to the current H.26L standard, so called "P-frames" have been significantly enhanced by introducing several
additional macroblock modes. In some cases it may now be necessary to transmit up to 16 motion vectors per
macroblock. Certain aspects of the present invention provide a way of encoding these motion vectors. For example, as
described below, Direct P prediction techniques can be used to select the motion vectors of collocated pixels in the
previous frame.

[0018] While these and other exemplary methods and apparatuses are described, it should be kept in mind that the
techniques of the present invention are not limited to the examples described and shown in the accompanying drawings,
but are also clearly adaptable to other similar existing and future video coding schemes, etc.

[0019] Before introducing such exemplary methods and apparatuses, an introduction is provided in the following
section for suitable exemplary operating environments, for example, in the form of a computing device and other types
of devices/appliances.

Exemplary Operational Environments: 


[0020] Turning to the drawings, wherein like reference numerals refer to like elements, the invention is illustrated as
being implemented in a suitable computing environment. Although not required, the invention will be described in the
general context of computer-executable instructions, such as program modules, being executed by a personal
computer.

[0021] Generally, program modules include routines, programs, objects, components, data structures, etc. that
perform particular tasks or implement particular abstract data types. Those skilled in the art will appreciate that the invention
may be practiced with other computer system configurations, including hand-held devices, multi-processor systems,
microprocessor based or programmable consumer electronics, network PCs, minicomputers, mainframe computers,
portable communication devices, and the like.

[0022] The invention may also be practiced in distributed computing environments where tasks are performed by
remote processing devices that are linked through a communications network. In a distributed computing environment,
program modules may be located in both local and remote memory storage devices.

[0023] Fig. 1 illustrates an example of a suitable computing environment 120 on which the subsequently described
systems, apparatuses and methods may be implemented. Exemplary computing environment 120 is only one example
of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality
of the improved methods and systems described herein. Neither should computing environment 120 be interpreted as
having any dependency or requirement relating to any one or combination of components illustrated in computing
environment 120.

[0024] The improved methods and systems herein are operational with numerous other general purpose or special
purpose computing system environments or configurations. Examples of well known computing systems,
environments, and/or configurations that may be suitable include, but are not limited to, personal computers, server computers,
thin clients, thick clients, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set
top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed
computing environments that include any of the above systems or devices, and the like.

[0025] As shown in Fig. 1, computing environment 120 includes a general-purpose computing device in the form of
a computer 130. The components of computer 130 may include one or more processors or processing units 132, a
system memory 134, and a bus 136 that couples various system components including system memory 134 to
processor 132.

[0026] Bus 136 represents one or more of any of several types of bus structures, including a memory bus or memory
controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus
architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA)
bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association
(VESA) local bus, and Peripheral Component Interconnect (PCI) bus, also known as Mezzanine bus.

[0027] Computer 130 typically includes a variety of computer readable media. Such media may be any available
media that is accessible by computer 130, and it includes both volatile and non-volatile media, removable and
non-removable media.

[0028] In Fig. 1, system memory 134 includes computer readable media in the form of volatile memory, such as
random access memory (RAM) 140, and/or non-volatile memory, such as read only memory (ROM) 138. A basic
input/output system (BIOS) 142, containing the basic routines that help to transfer information between elements within
computer 130, such as during start-up, is stored in ROM 138. RAM 140 typically contains data and/or program modules
that are immediately accessible to and/or presently being operated on by processor 132.

[0029] Computer 130 may further include other removable/non-removable, volatile/non-volatile computer storage
media. For example, Fig. 1 illustrates a hard disk drive 144 for reading from and writing to a non-removable,
non-volatile magnetic media (not shown and typically called a "hard drive"), a magnetic disk drive 146 for reading from and
writing to a removable, non-volatile magnetic disk 148 (e.g., a "floppy disk"), and an optical disk drive 150 for reading
from or writing to a removable, non-volatile optical disk 152 such as a CD-ROM/R/RW, DVD-ROM/R/RW/+R/RAM or
other optical media. Hard disk drive 144, magnetic disk drive 146 and optical disk drive 150 are each connected to bus
136 by one or more interfaces 154.

[0030] The drives and associated computer-readable media provide nonvolatile storage of computer readable
instructions, data structures, program modules, and other data for computer 130. Although the exemplary environment
described herein employs a hard disk, a removable magnetic disk 148 and a removable optical disk 152, it should be
appreciated by those skilled in the art that other types of computer readable media which can store data that is
accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, random access memories
(RAMs), read only memories (ROM), and the like, may also be used in the exemplary operating environment.
[0031] A number of program modules may be stored on the hard disk, magnetic disk 148, optical disk 152, ROM 
138, or RAM 140, including, e.g., an operating system 158, one or more application programs 160, other program 
modules 162, and program data 164. 

[0032] The improved methods and systems described herein may be implemented within operating system 158, one
or more application programs 160, other program modules 162, and/or program data 164.

[0033] A user may provide commands and information into computer 130 through input devices such as keyboard
166 and pointing device 168 (such as a "mouse"). Other input devices (not shown) may include a microphone, joystick,
game pad, satellite dish, serial port, scanner, camera, etc. These and other input devices are connected to the
processing unit 132 through a user input interface 170 that is coupled to bus 136, but may be connected by other interface
and bus structures, such as a parallel port, game port, or a universal serial bus (USB).

[0034] A monitor 172 or other type of display device is also connected to bus 136 via an interface, such as a video
adapter 174. In addition to monitor 172, personal computers typically include other peripheral output devices (not
shown), such as speakers and printers, which may be connected through output peripheral interface 175.

[0035] Computer 130 may operate in a networked environment using logical connections to one or more remote
computers, such as a remote computer 182. Remote computer 182 may include many or all of the elements and
features described herein relative to computer 130.

[0036] Logical connections shown in Fig. 1 are a local area network (LAN) 177 and a general wide area network
(WAN) 179. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets,
and the Internet.

[0037] When used in a LAN networking environment, computer 130 is connected to LAN 177 via network interface
or adapter 186. When used in a WAN networking environment, the computer typically includes a modem 178 or other
means for establishing communications over WAN 179. Modem 178, which may be internal or external, may be
connected to system bus 136 via the user input interface 170 or other appropriate mechanism.

[0038] Depicted in Fig. 1 is a specific implementation of a WAN via the Internet. Here, computer 130 employs modem
178 to establish communications with at least one remote computer 182 via the Internet 180.

[0039] In a networked environment, program modules depicted relative to computer 130, or portions thereof, may
be stored in a remote memory storage device. Thus, e.g., as depicted in Fig. 1, remote application programs 189 may
reside on a memory device of remote computer 182. It will be appreciated that the network connections shown and
described are exemplary and other means of establishing a communications link between the computers may be used.

[0040] Attention is now drawn to Fig. 2, which is a block diagram depicting another exemplary device 200 that is also
capable of benefiting from the methods and apparatuses disclosed herein. Device 200 is representative of any one or
more devices or appliances that are operatively configured to process video and/or any related types of data in
accordance with all or part of the methods and apparatuses described herein and their equivalents. Thus, device 200
may take the form of a computing device as in Fig. 1, or some other form, such as, for example, a wireless device, a
portable communication device, a personal digital assistant, a video player, a television, a DVD player, a CD player, a
karaoke machine, a kiosk, a digital video projector, a flat panel video display mechanism, a set-top box, a video game
machine, etc. In this example, device 200 includes logic 202 configured to process video data, a video data source
204 configured to provide video data to logic 202, and at least one display module 206 capable of displaying at least a
portion of the video data for a user to view. Logic 202 is representative of hardware, firmware, software and/or any
combination thereof. In certain implementations, for example, logic 202 includes a compressor/decompressor (codec),
or the like. Video data source 204 is representative of any mechanism that can provide, communicate, output, and/or
at least momentarily store video data suitable for processing by logic 202. Video reproduction source is illustratively
shown as being within and/or without device 200. Display module 206 is representative of any mechanism that a user
might view directly or indirectly and see the visual results of video data presented thereon. Additionally, in certain
implementations, device 200 may also include some form or capability for reproducing or otherwise handling audio
data associated with the video data. Thus, an audio reproduction module 208 is shown.

[0041] With the examples of Figs. 1 and 2 in mind, and others like them, the next sections focus on certain exemplary
methods and apparatuses that may be at least partially practiced using such environments and such devices.

Direct Prediction for Predictive (P) and Bidirectionally Predictive (B) frames in Video Coding:

[0042] This section presents a new highly efficient Inter Macroblock type that can significantly improve coding
efficiency, especially for high/complex motion sequences. This new Inter Macroblock type takes advantage of the temporal
and spatial correlations that may exist within frames at the macroblock level, and as a result can significantly reduce
the bits required for encoding motion information while retaining or even improving quality.

Direct Prediction

[0043] The above mentioned problems and/or others are at least partially solved herein by the introduction of a
"Direct Prediction Mode" wherein, instead of encoding the actual motion information, forward and/or backward
motion vectors are derived directly from the motion vectors used in the correlated macroblock of the subsequent
reference frame.

[0044] This is illustrated, for example, in Fig. 3, which shows three video frames, namely a P frame 300, a B frame
302 and P frame 304, corresponding to times t, t+1, and t+2, respectively. Also illustrated in Fig. 3 are macroblocks
within frames 300, 302 and 304 and exemplary motion vector (MV) information. Here, the frames have x and y
coordinates associated with them. The motion vector information for B frame 302 is predicted (here, e.g., interpolated) from
the motion vector information encoded for P frames 300 and 304. The exemplary technique is derived from the
assumption that an object is moving with constant speed, thus making it possible to predict its current position inside
B frame 302 without having to transmit any motion vectors. While this technique may reduce the bitrate significantly
for a given quality, it may not always be applied.

[0045] Introduced herein, in accordance with certain implementations of the present invention, is a new Inter
Macroblock type that can effectively exploit spatial and temporal correlations that may exist at the macroblock
level and in particular with regard to the motion vector information of the macroblock. According to this new mode it is
possible that a current macroblock may have motion that can be directly derived from previously decoded information
(e.g., Motion Projection). Thus, as illustratively shown in Fig. 4, there may not be a need to transmit any motion vectors
for a macroblock, or even for an entire frame. Here, a sequence 400 of video frames is depicted with solid arrows
indicating coded relationships between frames and dashed lines indicating predictable macroblock relationships. Video
frame 402 is an I frame, video frames 404, 406, 410, and 412 are B frames, and video frames 408 and 414 are P
frames. In this example, if P frame 408 has a motion field described by $MF_{408}$, the motion of the collocated macroblocks
in pictures 404, 406, and 414 is also highly correlated. In particular, assuming that speed is in general constant on the
entire frame and that frames 404 and 406 are equally spaced in time between frames 402 and 408, and also considering
that for B frames both forward and backward motion vectors could be used, the motion fields in frame 404 could be
equal to $MF^{fw}_{404} = \frac{1}{3} \times MF_{408}$ and $MF^{bw}_{404} = -\frac{2}{3} \times MF_{408}$ for forward and backward motion fields
respectively. Similarly, for frame 406 the motion fields could be $MF^{fw}_{406} = \frac{2}{3} \times MF_{408}$ and
$MF^{bw}_{406} = -\frac{1}{3} \times MF_{408}$ for forward and backward motion vectors respectively. Since frames 408 and 414 are
equally spaced, then, using the same assumption, the collocated macroblock could have motion vectors
$MF_{414} = MF_{408}$.
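
As a minimal sketch of the constant-speed scaling described above, the following Python function derives forward and backward vectors for a frame lying between two reference frames. The function name `project_motion`, its parameter names, and the use of exact fractions (with no rounding to a sub-pixel grid) are assumptions made for illustration only.

```python
from fractions import Fraction
from typing import Tuple

Vector = Tuple[Fraction, Fraction]


def project_motion(ref_mv: Tuple[int, int], dist_to_prev: int, ref_span: int) -> Tuple[Vector, Vector]:
    """Return (forward, backward) vectors for an in-between frame.

    ref_mv       -- vector of the collocated macroblock in the later
                    reference frame, covering `ref_span` frame intervals
    dist_to_prev -- frame intervals from the earlier reference frame
                    to the in-between frame
    """
    if not 0 < dist_to_prev < ref_span:
        raise ValueError("frame must lie strictly between its references")
    fw_scale = Fraction(dist_to_prev, ref_span)   # e.g. 1/3 or 2/3
    bw_scale = fw_scale - 1                       # e.g. -2/3 or -1/3
    forward = (fw_scale * ref_mv[0], fw_scale * ref_mv[1])
    backward = (bw_scale * ref_mv[0], bw_scale * ref_mv[1])
    return forward, backward
```

With a reference vector of (6, -3) spanning three frame intervals, the first in-between frame would get a forward vector of (2, -1) and a backward vector of (-4, 2), matching the 1/3 and -2/3 factors in the text.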

[0046] Similar to the Direct Mode in B frames, by again assuming that speed is constant, motion for a macroblock
can be directly derived from the correlated macroblock of the reference frame. This is further illustrated in Fig. 6, for
example, which shows three video frames, namely a P frame 600, a B frame 602 and P frame 604, corresponding to
times t, t+1, and t+2, respectively. Here, the illustrated collocated macroblocks have similar if not identical motion
information.

[0047] It is even possible to consider acceleration for refining such motion parameters, for example, see Fig. 7. Here,
for example, three frames are shown, namely a current frame 704 at time t, and previous frames 702 (time t-1) and
700 (time t-2), with different acceleration information illustrated by different length motion vectors.
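
One possible reading of the acceleration refinement is a constant-acceleration extrapolation from the two previous frames, sketched below in Python. The disclosure does not give a formula here, so the update rule (previous vector plus the change between the two previous vectors) and the name `extrapolate_with_acceleration` are assumptions made for this illustration.

```python
from typing import Tuple

MotionVector = Tuple[int, int]


def extrapolate_with_acceleration(mv_prev: MotionVector, mv_prev2: MotionVector) -> MotionVector:
    """Predict the vector at time t from those at t-1 and t-2.

    Assumes constant acceleration: the change observed between t-2 and
    t-1 is applied once more to obtain the vector at t.
    """
    accel_x = mv_prev[0] - mv_prev2[0]
    accel_y = mv_prev[1] - mv_prev2[1]
    return (mv_prev[0] + accel_x, mv_prev[1] + accel_y)
```

When the two previous vectors are equal, the acceleration term is zero and the sketch reduces to the constant-speed case of the preceding paragraph.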
[0048] The process may also be significantly improved by, instead of considering motion projection at the macroblock
level, taking into account that the pixels inside the previous image are possibly moving with a constant speed or a
constant acceleration (e.g., Pixel Projection). As such, one may generate a significantly more accurate prediction of
the current frame for B frame coding as illustrated, for example, in Fig. 8, and for P frame coding as illustrated, for
example, in Fig. 9. Fig. 8, for example, shows three video frames, namely a P frame 800, a B frame 802 and P frame
804, corresponding to times t, t+1, and t+2, respectively. Fig. 9, for example, shows three video frames, namely a P
frame 900, a B frame 902 and P frame 904, corresponding to times t, t+1, and t+2, respectively.
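
A rough sketch of pixel-level projection under the constant-speed assumption follows. It is illustrative only: the name `project_pixels`, the list-of-lists frame layout, the filling of uncovered positions with the collocated pixel, and letting the last writer win when two pixels land on the same position are all assumptions of this sketch rather than details taken from the disclosure.

```python
from typing import List, Tuple

Frame = List[List[int]]
MotionField = List[List[Tuple[int, int]]]   # per-pixel (dx, dy)


def project_pixels(prev_frame: Frame, pixel_motion: MotionField) -> Frame:
    """Predict the current frame by moving each pixel of the previous frame.

    Each pixel is assumed to continue with the displacement stored for it
    in `pixel_motion` (constant speed over one frame interval).
    """
    height = len(prev_frame)
    width = len(prev_frame[0])
    # Start from a collocated copy so positions no pixel lands on still
    # hold a value (hole filling by collocated pixel, an assumption).
    prediction = [row[:] for row in prev_frame]
    for y in range(height):
        for x in range(width):
            dx, dy = pixel_motion[y][x]
            target_x, target_y = x + dx, y + dy
            # Pixels projected outside the frame are simply dropped.
            if 0 <= target_x < width and 0 <= target_y < height:
                prediction[target_y][target_x] = prev_frame[y][x]
    return prediction
```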
[0049] In certain implementations it is also possible to combine both methods together for even better performance.

[0050] In accordance with certain further implementations, motion can also be derived from spatial information, for
example, using prediction techniques employed for the coding of motion vectors from the motion information of the
surrounding macroblocks. Additionally, performance can also be further enhanced by combining these
methods in a multi-hypothesis prediction architecture that does not require motion information to be transmitted.
Consequently, such new Macroblock types can achieve significant bitrate reductions while achieving similar or improved
quality.

Exemplary Encoding Processes:

[0051] Fig. 10 illustrates an exemplary encoding environment 1000, having a conventional video encoder
1002, wherein a video data 1004 is provided to encoder 1002 and a corresponding encoded video data bitstream is
produced.

[0052] Video data 1004 is provided to a summation module 1006, which also receives, as an input, the output from
a motion compensation (MC) module 1022. The output from summation module 1006 is provided to a discrete cosine
transform (DCT) module 1010. The output of DCT module 1010 is provided as an input to a quantization module (QP)
1012. The output of QP module 1012 is provided as an input to an inverse quantization module (QP⁻¹) 1014 and as an
input to a variable length coding (VLC) module 1016. VLC module 1016 also receives as an input an output from a
motion estimation (ME) module 1008. The output of VLC module 1016 is an encoded video bitstream 1210.

[0053] The output of QP⁻¹ module 1014 is provided as an input to an inverse discrete cosine transform (IDCT) module
1018. The output of 1018 is provided as an input to a summation module 1020, which has as another input the output
from MC module 1022. The output from summation module 1020 is provided as an input to a loop filter module 1024.
The output from loop filter module 1024 is provided as an input to a frame buffer module 1026. One output from frame
buffer module 1026 is provided as an input to ME module 1008, and another output is provided as an input to MC
module 1022. ME module 1008 also receives as an input video data 1004. An output from ME 1008 is provided as an
input to MC module 1022.

[0054] In this example, MC module 1022 receives inputs from ME module 1008. Here, ME is performed on a current
frame against a reference frame. ME can be performed using various block sizes and search ranges, after which a
"best" parameter, using some predefined criterion for example, is encoded and transmitted (INTER coding). The residue
information is also coded after performing DCT and QP. It is also possible that in some cases the performance of
ME does not produce a satisfactory result, and thus a macroblock, or even a subblock, could be INTRA encoded.
[0055] Considering that motion information could be quite costly, the encoding process can be modified as in Fig.
12, in accordance with certain exemplary implementations of the present invention, to also consider
the possibility that the motion vectors for a macroblock could be temporally and/or spatially predicted from previously
encoded motion information. Such decisions, for example, can be performed using Rate Distortion Optimization
techniques or other cost measures. Using such techniques/modes it may not be necessary to transmit detailed motion
information, because such may be replaced with a Direct Prediction (Direct P) Mode, e.g., as illustrated in Fig. 5.
[0056] Motion can be modeled, for example, in any of the following models or their combinations: (1) Motion
Projection (e.g., as illustrated in Fig. 3 for B frames and Fig. 6 for P frames); (2) Pixel Projection (e.g., as
illustrated in Fig. 8 for B frames and Fig. 9 for P frames); (3) Spatial MV Prediction (e.g., median value of
motion vectors of collocated macroblocks); (4) Weighted average of Motion Projection and Spatial Prediction; (5)
or other like techniques.

[0057] Other prediction models (e.g., acceleration, filtering, etc.) may also be used. If only one of these models
is to be used, then this should be common in both the encoder and the decoder. Otherwise, one may use submodes
which will immediately guide the decoder as to which model it should use. Those skilled in the art will also
recognize that multi-referencing a block or macroblock is also possible using any combination of the above models.
[0058] In Fig. 12, an improved video encoding environment 1200 includes a video encoder 1202 that receives video
data 1004 and outputs a corresponding encoded video data bitstream.

[0059] Here, video encoder 1202 has been modified to include improvement 1204. Improvement 1204 includes an
additional motion vector (MV) buffer module 1206 and a DIRECT decision module 1208. As shown, MV buffer module
1206 is configured to receive as inputs the output from frame buffer module 1026 and the output from ME module
1008. The output from MV buffer module 1206 is provided, along with the output from ME module 1008, as an input
to DIRECT decision module 1208. The output from DIRECT decision module 1208 is then provided as an input to MC
module 1022 along with the output from frame buffer module 1026.
[0060] For the exemplary architecture to work successfully, the Motion Information from the previously coded frame
is stored intact, which is the purpose for adding MV buffer module 1206. MV buffer module 1206 can be used to store
motion vectors. In certain implementations, MV buffer module 1206 may also store information about the reference
frame used and the Motion Mode used. In the case of acceleration, for example, additional buffering may be useful
for storing motion information of the 2nd or even N previous frames when, for example, a more complicated model
for acceleration is employed.

[0061] If a macroblock, subblock, or pixel is not associated with a Motion Vector (i.e., a macroblock is intra coded),
then for such block it is assumed that the Motion Vector used is (0, 0) and that only the previous frame was used as
reference.
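
The storage rule of paragraphs [0060] and [0061] can be illustrated with a minimal sketch. The class and field names below (`MVBuffer`, `StoredMotion`) are hypothetical and only stand in for MV buffer module 1206; the sketch assumes motion is stored per block position and that reference frame index 0 denotes the previous coded frame.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

MotionVector = Tuple[int, int]
BlockPos = Tuple[int, int]


@dataclass(frozen=True)
class StoredMotion:
    mv: MotionVector = (0, 0)  # zero motion by default
    ref_frame: int = 0         # assumed: 0 means the previous coded frame


class MVBuffer:
    """Hypothetical stand-in for MV buffer module 1206."""

    def __init__(self) -> None:
        self._blocks: Dict[BlockPos, StoredMotion] = {}

    def store(self, pos: BlockPos, mv: Optional[MotionVector], ref_frame: int = 0) -> None:
        # An intra-coded block has no motion vector (mv is None): record
        # (0, 0) against the previous frame, as paragraph [0061] assumes.
        if mv is None:
            self._blocks[pos] = StoredMotion()
        else:
            self._blocks[pos] = StoredMotion(mv, ref_frame)

    def lookup(self, pos: BlockPos) -> StoredMotion:
        # Positions that were never stored are treated like intra blocks.
        return self._blocks.get(pos, StoredMotion())
```

Storing the reference frame index alongside the vector reflects the remark that the buffer may also keep information about the reference frame used.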

[0062] If multi-frame referencing is used, one may select to use the motion information as is, and/or to interpolate 
the motion information with reference to the previous coded frame. This is essentially up to the design, but also in 
practice it appears that, especially for the case of (0, 0) motion vectors, it is less likely that the current block is still 
being referenced from a much older frame. 
[0063] One may combine Direct Prediction with an additional set of Motion Information which is, unlike before,
encoded as part of the Direct Prediction. In such a case the prediction can, for example, be a multi-hypothesis
prediction of both the Direct Prediction and the Motion Information.

[0064] Since there are several possible Direct Prediction submodes that one may combine, such could also be
combined within a multi-hypothesis framework. For example, the prediction from motion projection could be combined
with that of pixel projection and/or spatial MV prediction.

[0065] Direct Prediction can also be used at the subblock level within a macroblock. This is already done for B frames 
inside the current H.26L codec, but is currently only using Motion Projection and not Pixel Projection or their combi- 
nations. 

[0066] For B frame coding, one may perform Direct Prediction from only one direction (forward or backward) and
not always necessarily from both sides. One may also use Direct Prediction inside the Bidirectional mode of B
frames, where one of the predictions is using Direct Prediction.

[0067] In the case of Multi-hypothesis images, for example, it is possible that a P frame references a future
frame. Here, proper scaling and/or inversion of the motion information can be performed, similar to B frame motion
interpolation.

[0068] Run-length coding, for example, can also be used: if consecutive "equivalent" Direct P modes are used in
coding a frame or slice, then these can be encoded using a run-length representation.
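
As a rough illustration of such a run-length representation, the sketch below collapses runs of a Direct P mode in a sequence of macroblock modes into (mode, count) pairs and leaves all other modes untouched. The mode labels and the output layout are assumptions made for illustration only; the actual bitstream syntax is not specified here.

```python
from itertools import groupby
from typing import List, Tuple, Union

DIRECT_P = "DIRECT_P"  # hypothetical label for an "equivalent" Direct P mode

Symbol = Union[str, Tuple[str, int]]


def run_length_encode_modes(modes: List[str]) -> List[Symbol]:
    """Replace each run of consecutive Direct P modes with (DIRECT_P, run length)."""
    encoded: List[Symbol] = []
    for mode, group in groupby(modes):
        count = sum(1 for _ in group)
        if mode == DIRECT_P:
            encoded.append((DIRECT_P, count))
        else:
            # Non-Direct modes are passed through one symbol per macroblock.
            encoded.extend([mode] * count)
    return encoded
```

For example, three consecutive Direct P macroblocks become the single pair `("DIRECT_P", 3)`.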
[0069] DIRECT decision module 1208 essentially performs the decision whether the Direct Prediction mode should 
be used instead of the pre-existing Inter or Intra modes. By way of example, the decision may be based on joint Rate/ 
Distortion Optimization criteria, and/or also separate bitrate or distortion requirements or restrictions. 

[0070] It is also possible, in alternate implementations, that DIRECT decision module 1208 precedes ME module
1008. In such a case, if Direct Prediction can immediately provide a good enough estimate of the motion parameters,
based on some predefined conditions, ME module 1008 could be completely bypassed, thus also considerably reducing
the computation of the encoding.

Exemplary Decoding Processes:

[0071] Reference is now made to Fig. 11, which depicts an exemplary conventional decoding environment 1100
having a video decoder 1102 that receives an encoded video data bitstream 1104 and outputs corresponding (decoded)
video data 1120.

[0072] Encoded video data bitstream 1104 is provided as an input to a variable length decoding (VLD) module 1106.
The output of VLD module 1106 is provided as an input to a QP⁻¹ module 1108, and as an input to an MC module 1110.
The output from QP⁻¹ module 1108 is provided as an input to an IDCT module 1112. The output of IDCT module 1112
is provided as an input to a summation module 1114, which also receives as an input an output from MC module 1110.
The output from summation module 1114 is provided as an input to a loop filter module 1116. The output of loop
filter module 1116 is provided to a frame buffer module 1118. An output from frame buffer module 1118 is provided
as an input to MC module 1110. Frame buffer module 1118 also outputs (decoded) video data 1120.
[0073] An exemplary improved decoder 1302 for use in a Direct Prediction environment 1300 further includes an
improvement 1306. Here, as shown in Fig. 13, improved decoder 1302 receives encoded video data bitstream 1210,
for example, as output by improved video encoder 1202 of Fig. 12, and outputs corresponding (decoded) video
data 1304.

[0074] Improvement 1306, in this example, is operatively inserted between MC module 1110 and a VLD module
1106'. Improvement 1306 includes an MV buffer module 1308 that receives as an input an output from VLD module
1106'. The output of MV buffer module 1308 is provided as a selectable input to a selection module 1312 of
improvement 1306. A block mode module 1310 is also provided in improvement 1306. Block mode module 1310 receives
as an input an output from VLD module 1106'. An output of block mode module 1310 is provided as an input to VLD
module 1106', and also as a controlling input to selection module 1312. An output from VLD module 1106' is provided
as a selectable input to selection module 1312. Selection module 1312 is configured to selectably provide either an
output from MV buffer module 1308 or VLD module 1106' as an input to MC module 1110.
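
The role of selection module 1312 can be sketched as a simple switch controlled by the block mode. The function name, the mode label `"DIRECT"`, and the dictionary used as the motion store are hypothetical; the zero-motion fallback for positions without stored motion is an assumption in line with how intra blocks are treated elsewhere in this description.

```python
from typing import Dict, Optional, Tuple

MotionVector = Tuple[int, int]
BlockPos = Tuple[int, int]


def select_motion(block_mode: str,
                  pos: BlockPos,
                  decoded_mv: Optional[MotionVector],
                  mv_buffer: Dict[BlockPos, MotionVector]) -> MotionVector:
    """Choose the motion vector that is handed to motion compensation."""
    if block_mode == "DIRECT":
        # Direct Prediction: nothing was transmitted, reuse the stored motion
        # of the previous frame (zero motion if none was stored).
        return mv_buffer.get(pos, (0, 0))
    if decoded_mv is None:
        raise ValueError("non-Direct blocks must carry a decoded motion vector")
    # Any other inter mode: use the motion vector decoded from the bitstream.
    return decoded_mv
```

A complete decoder would also write the selected vector into a buffer for the next frame; that bookkeeping is left out to keep the switch itself visible.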

[0075] With improvement 1306, for example, motion information for each pixel can be stored, and if the mode of a
macroblock is identified as the Direct Prediction mode, then the stored motion information and the proper
projection or prediction method are selected and used. It should be noted that if only Motion Projection is used,
then the changes to an existing decoder are very minor, and the additional complexity added to the decoder could
be considered negligible.

[0076] If submodes are used, then improved decoder 1302 can, for example, be configured to perform steps opposite
to the prediction steps that improved encoder 1202 performs, in order to properly decode the current macroblock.
[0077] Again, non-referenced pixels (such as intra blocks) may be considered as having zero motion for the motion
storage.

Some Exemplary Schemes 






[0078] Considering that there are several possible predictors that may be immediately used with Direct Prediction, 
for brevity purposes in this description a smaller subset of cases, which are not only rather efficient but also simple to 
implement, are described in greater detail. In particular, the following models are examined in greater demonstrative 
detail: 






(A) In this example, Motion Projection is the only mode used. No run-length coding of Direct Modes is used,
whereas residue information is also transmitted. A special modification of the motion parameters is performed in
the case that a zero motion vector is used. In such a situation, the reference frame for the Direct Prediction is
always set to zero (e.g., previous encoded frame). Furthermore, intra coded blocks are considered as having zero
motion and reference frame parameters.






(B) This example is like example (A) except that no residue is transmitted. 

(C) This example is basically a combination of examples (A) and (B), in that if QP<n (e.g., n = 24) then the residue 
is also encoded, otherwise no residue is transmitted. 

(D) This example is an enhanced Direct Prediction scheme that combines three submodes, namely:

(1) Motion Projection ($MV_{MP}$);

(2) Spatial MV Prediction ($MV_{SP}$); and

(3) A weighted average of these two cases

\[ \frac{MV_{MP} + 2\,MV_{SP}}{3} \]

Wherein, residue is not transmitted for QP<n (e.g., n=24). Here, run-length coding is not used. The partitioning
of the submodes can be set as follows:



Submodes             Code
Spatial Predictor    0
Motion Projection    1
Weighted Average     2



The best submode could be selected using a Rate Distortion Optimization process (best compromise between 
bitrate and quality). 



(E) A combination of example (C) with Pixel Projection. Here, for example, an average of two predictions is used
for the Direct Prediction Mode.

(F) This is a combination of example (C) with Motion_Copy R2 (see, e.g., Jani Lainema and Marta Karczewicz,
"Skip mode motion compensation", Doc. JVT-C027, May 2002, which is incorporated herein by reference) or the
like. This case can be seen as an alternative to the usage of the Spatial MV Predictor used in example (D), with
one difference being that the spatial predictor, under certain conditions, completely replaces the zero skip mode,
and that this example (F) can be run-length encoded, thus being able to achieve more efficient performance.
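
The submode selection of example (D) above can be sketched as follows. This is only an illustration under stated assumptions: the spatial predictor is taken as a component-wise median of neighboring vectors, the weighted average is normalized by the sum of its weights (3), and `rd_cost` is a hypothetical caller-supplied function standing in for the Rate Distortion Optimization process.

```python
from statistics import median
from typing import Callable, Sequence, Tuple

MV = Tuple[float, float]

# Submode codes as listed for example (D).
SPATIAL, PROJECTION, WEIGHTED = 0, 1, 2


def spatial_predictor(neighbors: Sequence[MV]) -> MV:
    """Component-wise median of the neighboring motion vectors (MV_SP)."""
    return (median(v[0] for v in neighbors), median(v[1] for v in neighbors))


def weighted_average(mv_mp: MV, mv_sp: MV) -> MV:
    """(MV_MP + 2 * MV_SP) / 3; the divisor 3 is an assumption."""
    return ((mv_mp[0] + 2 * mv_sp[0]) / 3, (mv_mp[1] + 2 * mv_sp[1]) / 3)


def best_submode(mv_mp: MV,
                 neighbors: Sequence[MV],
                 rd_cost: Callable[[MV], float]) -> Tuple[int, MV]:
    """Return (submode code, predicted MV) with the lowest cost."""
    mv_sp = spatial_predictor(neighbors)
    candidates = {
        SPATIAL: mv_sp,
        PROJECTION: mv_mp,
        WEIGHTED: weighted_average(mv_mp, mv_sp),
    }
    code = min(candidates, key=lambda c: rd_cost(candidates[c]))
    return code, candidates[code]
```

Only the winning submode code needs to be signaled, since the decoder can recompute the same three candidates from its own stored motion.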

Motion Vector Prediction in Bidirectionally Predictive (B) frames with regards to Direct Mode:

[0079] The current JVT standard appears to be quite unclear on how a Direct Mode coded macroblock or block
should be considered in the motion vector prediction within Bidirectionally Predicted (B) frames. Instead, it appears
that the current software considers a Direct Mode Macroblock or subblock as having a "different reference frame" and
thus not used in the prediction. Unfortunately, considering that there might still be high correlation between the
motion vectors of a Direct predicted block and its neighbors, such a condition could considerably hinder the
performance of B frames and reduce their efficiency. This could also reduce the efficiency of error concealment
algorithms when applied to B frames.

[0080] In this section, exemplary alternative approaches are presented, which can improve the coding efficiency
and increase the correlation of motion vectors within B frames, for example. This is done by considering a Direct
Mode coded block essentially equivalent to a Bidirectionally predicted block within the motion prediction phase.

[0081] Direct Mode Macroblocks or blocks (for example, in the case of 8x8 sub-partitions) could considerably improve
the efficacy of Bidirectionally Predicted (B) frames since they can effectively exploit temporal correlations of motion
vector information of adjacent frames. The idea is essentially derived from temporal interpolation techniques where
the assumption is made that if a block has moved from a position $(x+dx, y+dy)$ at time $t$ to a position $(x, y)$
at time $t+2$, then, by using temporal interpolation, at time $t+1$ the same block must have essentially been at
position:

\[ \left(x + \frac{dx}{2},\; y + \frac{dy}{2}\right) \]

[0082] This is illustrated, for example, in Fig. 14, which shows three frames, namely, a P frame 1400, a B frame 1402
and P frame 1404, corresponding to times $t$, $t+1$, and $t+2$, respectively. The approach most often used in current
encoding standards, though, instead assumes that the block at position $(x, y)$ of the frame at time $t+1$ most
likely can be found at positions:

\[ \left(x + \frac{dx}{2},\; y + \frac{dy}{2}\right) \text{ at time } t \quad \text{and} \quad \left(x - \frac{dx}{2},\; y - \frac{dy}{2}\right) \text{ at time } t+2. \]
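
A small numeric sketch of this second assumption follows. It takes the offsets to be half of the collocated displacement (dx, dy), which follows from the B frame sitting midway between the two P frames; the function name is hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def direct_mode_references(x: float, y: float, dx: float, dy: float) -> Tuple[Point, Point]:
    """Positions referenced by a B-frame block at (x, y), time t+1.

    (dx, dy) is the displacement of the collocated block between time t
    and time t+2. Returns the position at time t and at time t+2.
    """
    at_t = (x + dx / 2, y + dy / 2)         # forward reference, earlier P frame
    at_t_plus_2 = (x - dx / 2, y - dy / 2)  # backward reference, later P frame
    return at_t, at_t_plus_2
```

For a block at (10, 20) and a collocated displacement of (4, -6), the references are (12, 17) at time t and (8, 23) at time t+2, and no motion vector has to be transmitted for the block.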


[0083] The latter is illustrated in Fig. 15, which shows three frames, namely, a P frame 1500, a B frame 1502 and P
frame 1504, corresponding to times t, t+1, and t+2, respectively. Since the number of Direct Mode coded blocks within
a sequence can be significant, whereas no residue and motion information are transmitted for such a case, efficiency
of B frames can be considerably increased. Run-length coding (for example, if the Universal Variable Length Code
(UVLC) entropy coding is used) may also be used to improve performance even further.

[0084] Unfortunately, the current JVT standard does not clarify how the motion vector prediction of blocks adjacent
to Direct Mode blocks should be performed. As it appears from the current software, Direct Mode blocks are currently
considered as having a "different reference frame", thus no spatial correlation is exploited in such a case. This could
considerably reduce the efficiency of the prediction, but could also potentially affect the performance of error
concealment algorithms applied on B frames in case such is needed.

[0085] By way of example, if one would like to predict the motion vector of E in the current codec, if A, B, C, and D 
were all Direct Mode coded, then the predictor will be set as (0,0) which would not be a good decision. 
[0086] In Fig. 16, for example, E is predicted from A, B, C, and D. Thus, if A, B, C, or D are Direct Mode coded then
their actual values are not currently used in the prediction. This can be modified, however. Thus, for example, if A, B,
C, or D are Direct Mode coded, then actual values of Motion Vectors and reference frames can be used in the prediction.
This provides two selectable options: (1) if collocated macroblock/block in subsequent P frame is intra coded then a
reference frame is set to -1; (2) if collocated macroblock/block in subsequent P frame is intra coded then assume
reference frame is 0.
[0087] In accordance with certain aspects of the present invention, instead one may use the actual Motion information
available from the Direct Mode coded blocks, for performing the motion vector prediction. This will enable a higher
correlation of the motion vectors within a B frame sequence, and thus can lead to improved efficiency.
[0088] One possible issue is how to appropriately handle Direct Mode Macroblocks for which the collocated
block/macroblock in the subsequent frame was intra coded. Here, for example, two possible options include:

(1) Consider this macroblock/block as having a different reference frame, thus do not use it in the motion vector
prediction; and

(2) Consider this macroblock as having (0, 0) motion vector and reference frame 0.
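
The two options can be compared in a short sketch. The predictor below is deliberately simplified: it assumes a component-wise median over those neighbors that use the same reference frame as the block being predicted, and falls back to (0, 0) when no neighbor is usable. The `Neighbor` type and the `option` parameter are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence, Tuple

MV = Tuple[float, float]


@dataclass
class Neighbor:
    mv: MV
    ref_frame: int
    is_direct: bool = False
    colocated_intra: bool = False  # collocated block in subsequent frame was intra


def effective_motion(nb: Neighbor, option: int) -> Optional[Tuple[MV, int]]:
    """Motion a neighbor contributes to the prediction, or None if excluded."""
    if nb.is_direct and nb.colocated_intra:
        if option == 1:
            return None          # option (1): treated as a different reference frame
        return ((0, 0), 0)       # option (2): zero motion, reference frame 0
    # Direct Mode neighbors otherwise contribute their actual motion.
    return (nb.mv, nb.ref_frame)


def predict_mv(neighbors: Sequence[Neighbor], target_ref: int, option: int = 2) -> MV:
    usable = []
    for nb in neighbors:
        motion = effective_motion(nb, option)
        if motion is not None and motion[1] == target_ref:
            usable.append(motion[0])
    if not usable:
        return (0, 0)
    return (median(v[0] for v in usable), median(v[1] for v in usable))
```

With Direct Mode neighbors contributing their actual vectors, the predictor no longer collapses to (0, 0) merely because all neighbors were Direct Mode coded.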

[0089] In accordance with certain other exemplary implementations of the present invention, a further modification
can be made in the de-blocking filter process. For the Direct Mode case, a de-blocking filter process can be configured
to compare stored motion vector information that is taken from Direct Mode coded blocks; otherwise these would
usually be considered as zero. In another modification, however, instead one may configure the de-blocking filter
process to compare the (exact) motion vectors regardless of the block type that is used. Thus, in certain
implementations, if for Direct Coded blocks no residue is transmitted, a "stronger" de-blocking filter can provide
further improved performance.

[0090] Furthermore, in certain other implementations, the Rate Distortion Decision for B frames can be redesigned
since it is quite likely that for certain implementations of the motion vector prediction scheme, a different
Lagrangian parameter λ used in Rate Distortion Optimization decisions may lead to further coding efficiency. Such
λ can be taken, for example, as:

\[ \lambda = 0.85 \times 2^{QP/3} \]
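
As a worked illustration, reading the formula above as λ = 0.85 · 2^(QP/3), the sketch below evaluates λ for a given quantizer and combines it with a distortion and a rate in the usual Lagrangian form J = SSD + λ · R. The function names are hypothetical.

```python
def lagrangian_lambda(qp: int) -> float:
    """Lagrangian multiplier, read here as 0.85 * 2^(QP/3)."""
    return 0.85 * 2 ** (qp / 3)


def rd_cost(ssd: float, rate_bits: float, qp: int) -> float:
    """Rate-distortion cost J = SSD + lambda * R for one candidate mode."""
    return ssd + lagrangian_lambda(qp) * rate_bits
```

With this reading, λ doubles every three QP steps, so the rate term weighs more heavily at coarse quantization, where each bit buys a smaller reduction in distortion.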



Inter Mode Decision Refinement: 

[0091] The JVT standard currently has an overwhelming performance advantage versus most other current Block
Based coding standards. Part of this performance can be attributed to the possibility of using variable block sizes
ranging from 16x16 down to 4x4 (pixels), instead of having fixed block sizes. Doing so, for example, allows for a more
effective exploitation of temporal correlation. Unfortunately, it has been found that, due to the Mode Decision
techniques currently existing in conventional coding logic (e.g., hardware, firmware, and/or software), mode decisions
might not be optimally performed, thus wasting bits that could be better allocated.

[0092] In this section, further methods and apparatuses are provided that at least partly solve this problem and/or 
others. Here, the exemplary methods and apparatuses have been configured for use with at least 16x8 and 8x16 
(pixel) block modes. Furthermore, using a relatively simple solution where at least one additional criterion is introduced, 
a saving of between approximately 5% and 10% is provided in the complexity of the encoder. 

[0093] Two key features of the JVT standard are variable macroblock mode selection and Rate Distortion
Optimization. A 16x16 (pixel) macroblock can be coded using different partitioning modes for which motion information
is also transmitted. The selection of the mode to be used can be performed in the Rate Distortion Optimization phase
of the encoding where a joint decision of best possible quality at best possible bitrate is attempted. Unfortunately,
since the assignment of the best possible motion information for each subpartition is done in an entirely different
process of the encoding, it is possible in some cases that a non-16x16 mode (e.g., 16x8 or 8x16 (pixel)) carries
motion information that is equivalent to a 16x16 macroblock. Since the motion predictors used for each mode could
also be different, it is quite possible in many cases that such 16x16 type motion information could be different from
the one assigned to the 16x16 mode. Furthermore, under certain conditions, the Rate Distortion Optimization may in the
end decide to use the non-16x16 macroblock type, even though it contains 16x16 motion information, without examining
whether such could have been better if coded using a 16x16 mode.

[0094] Recognizing this, an exemplary system can be configured to determine when such a case occurs, such that
improved performance may be achieved. In accordance with certain exemplary implementations of the present
invention, two additional modes, e.g., referred to as P2to1 and P3to1, are made available within the Mode decision
process/phase. The P2to1 and P3to1 modes are enabled when the motion information of a 16x8 and 8x16 subpartitioning,
respectively, is equivalent to that of a 16x16 mode.

[0095] In certain implementations all motion vectors and reference frames assigned to each partition may be equal.
As such, the equivalent mode can be enabled and examined during a rate distortion process/phase. Since the residue
and distortion information will not likely change compared to the subpartition case, they can be reused without
significantly increasing computation.

[0096] Considering though that the Rate Distortion Mode Decision is not perfect, it is possible that the addition and
consideration of these two additional modes regardless of the current best mode may, in some limited cases, reduce
the efficiency instead of improving it. As an alternative, one may enable these modes only when the corresponding
subpartitioning mode was also the best possible one according to the Mode decision employed. Doing so may yield
improvements (e.g., bitrate reduction) versus the other logic (e.g., codecs, etc.), while not affecting the PSNR.
[0097] If the motion information of the 16x8 or 8x16 subpartitioning is equivalent to that of the 16x16 mode, then
performing mode decision for such a mode may be unnecessary. For example, if the motion vector predictor of the first
subpartition is exactly the same as the motion vector predictor of the 16x16 mode, performing mode decision is
unnecessary. If such condition is satisfied, one may completely skip this mode during the Mode Decision process. Doing
so can significantly reduce complexity since it would not be necessary, for this mode, to perform DCT, Quantization,
and/or other like Rate Distortion processes/measurements, which tend to be rather costly during the encoding process.
[0098] In certain other exemplary implementations, the entire process can be further extended to a Tree-structured
macroblock partition as well. See, e.g., Heiko Schwarz and Thomas Wiegand, "Tree-structured macroblock partition",
Doc. VCEG-N17, Dec. 2001.

An Exemplary Algorithm 

[0099] Below are certain acts that can be performed to provide a mode refinement in an exemplary codec or other
like logic (note that in certain other implementations, the order of the acts may be changed and/or certain acts may
be performed together):

Act 1: Set Valid[P2to1] = Valid[P3to1] = 0.

Act 2: Perform Motion Vector and Reference frame decision for each possible Inter Mode. Let $MV_{16\times16}$,
$MVP_{16\times16}$, and $refframe_{16\times16}$ be the motion vector, motion vector predictor, and reference frame of
the 16x16 mode, $\{MV^{a}_{16\times8}, MV^{b}_{16\times8}\}$, $\{MVP^{a}_{16\times8}, MVP^{b}_{16\times8}\}$, and
$\{refframe^{a}_{16\times8}, refframe^{b}_{16\times8}\}$ the corresponding information for the 16x8 mode, and
$\{MV^{a}_{8\times16}, MV^{b}_{8\times16}\}$, $\{MVP^{a}_{8\times16}, MVP^{b}_{8\times16}\}$, and
$\{refframe^{a}_{8\times16}, refframe^{b}_{8\times16}\}$ for the 8x16 mode.

Act 3: If ($MV^{a}_{16\times8} \neq MV^{b}_{16\times8}$) OR ($refframe^{a}_{16\times8} \neq refframe^{b}_{16\times8}$),
then goto Act 7.

Act 4: If ($MV^{a}_{16\times8} \neq MV_{16\times16}$) OR ($MVP^{a}_{16\times8} \neq MVP_{16\times16}$) OR
($refframe^{a}_{16\times8} \neq refframe_{16\times16}$), then goto Act 6.

Act 5: Valid[16x8] = 0; goto Act 7
(e.g., disable 16x8 mode if identical to 16x16, for complexity reduction).

Act 6: Valid[P2to1] = 1 (e.g., enable refinement mode for 16x8);
$MV_{P2to1} = MV^{a}_{16\times8}$; $refframe_{P2to1} = refframe^{a}_{16\times8}$.

Act 7: If ($MV^{a}_{8\times16} \neq MV^{b}_{8\times16}$) OR ($refframe^{a}_{8\times16} \neq refframe^{b}_{8\times16}$),
then goto Act 11.

Act 8: If ($MV^{a}_{8\times16} \neq MV_{16\times16}$) OR ($MVP^{a}_{8\times16} \neq MVP_{16\times16}$) OR
($refframe^{a}_{8\times16} \neq refframe_{16\times16}$), then goto Act 10.

Act 9: Valid[8x16] = 0; goto Act 11
(e.g., disable 8x16 mode if identical to 16x16, to reduce complexity).

Act 10: Valid[P3to1] = 1 (e.g., enable refinement mode for 8x16);
$MV_{P3to1} = MV^{a}_{8\times16}$; $refframe_{P3to1} = refframe^{a}_{8\times16}$.

Act 11: Perform Rate Distortion Optimization for all Inter & Intra modes if (Valid[MODE] = 1),
where MODE ∈ {INTRA4x4, INTRA16x16, SKIP, 16x16, 16x8, 8x16, P8x8}, using the Lagrangian functional:

\[ J(s, c, MODE \mid QP, \lambda_{MODE}) = SSD(s, c, MODE \mid QP) + \lambda_{MODE} \cdot R(s, c, MODE \mid QP) \]

Set best mode to BestMode.

Act 12: If (BestMode != 16x8) then Valid[P2to1] = 0 (note that this act is optional).

Act 13: If (BestMode != 8x16) then Valid[P3to1] = 0 (note that this act is optional).

Act 14: Perform Rate Distortion Optimization for the two additional modes if (Valid[MODE] = 1),
where MODE ∈ {P2to1, P3to1} (e.g., modes are considered equivalent to 16x16 modes).

Act 15: Set BestMode to the overall best mode found.
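
The validity bookkeeping of the acts above (excluding the Rate Distortion Optimization itself) can be sketched as follows. The `Partition` type and function names are hypothetical. The sketch pairs P2to1 with the 16x8 mode and P3to1 with the 8x16 mode, following paragraph [0094], and treats the optional pruning acts accordingly.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

MV = Tuple[int, int]


@dataclass(frozen=True)
class Partition:
    mv: MV    # motion vector
    mvp: MV   # motion vector predictor
    ref: int  # reference frame index


def refine_valid_modes(p16x16: Partition,
                       p16x8: Tuple[Partition, Partition],
                       p8x16: Tuple[Partition, Partition]):
    """Acts 1 to 10: decide which of 16x8, 8x16, P2to1, P3to1 are examined."""
    valid: Dict[str, bool] = {"16x8": True, "8x16": True,
                              "P2to1": False, "P3to1": False}   # Act 1
    refined: Dict[str, Tuple[MV, int]] = {}
    for mode, extra, (a, b) in (("16x8", "P2to1", p16x8),
                                ("8x16", "P3to1", p8x16)):
        # Acts 3 and 7: the two subpartitions must share motion and reference.
        if a.mv != b.mv or a.ref != b.ref:
            continue
        # Acts 4 and 8: is this 16x16-type motion different from the 16x16 mode?
        if a.mv != p16x16.mv or a.mvp != p16x16.mvp or a.ref != p16x16.ref:
            valid[extra] = True               # Acts 6 and 10
            refined[extra] = (a.mv, a.ref)
        else:
            valid[mode] = False               # Acts 5 and 9: identical, skip it
    return valid, refined


def prune_refinement_modes(valid: Dict[str, bool], best_mode: str) -> Dict[str, bool]:
    """Optional Acts 12 and 13: keep a refinement mode only if its source mode won."""
    pruned = dict(valid)
    if best_mode != "16x8":
        pruned["P2to1"] = False
    if best_mode != "8x16":
        pruned["P3to1"] = False
    return pruned
```

Acts 11, 14, and 15 would then evaluate the Lagrangian cost only for modes whose entry in `valid` is true, which is where the complexity saving comes from.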

Applying Exemplary Direct Prediction Techniques For Interlace Coding:

[0100] Due to the increased interest in interlaced video coding inside the H.26L standard, several proposals have
been presented on enhancing the encoding performance of interlaced sequences. In this section techniques are
presented that can be implemented in the current syntax of H.26L, and/or other like systems. These exemplary
techniques can provide performance enhancement. Furthermore, Direct P Prediction technology is introduced, similar
to Direct B Prediction, which can be applied in both interlaced and progressive video coding.
[0101] Further Information On Exemplary Direct P Prediction Techniques:

[0102] Direct Mode of motion vectors inside B-frames can significantly benefit encoding performance since it can
considerably reduce the bits required for motion vector encoding, especially considering that up to two motion vectors
have to be transmitted. If, though, a block is coded using Direct Mode, no motion vectors are necessary; instead,
these are calculated as temporal interpolations of the motion vectors of the collocated blocks in the first
subsequent reference image. A similar approach for P frames appears to have never been considered since the structure
of P frames and of their corresponding macroblocks was much simpler, while each macroblock required only one motion
vector. Adding such a mode would have instead, most likely, incurred a significant overhead, thus possibly negating
any possible gain.

[0103] In H.26L, on the other hand, P frames were significantly enhanced by introducing several additional macroblock
Modes. As described previously, in many cases it might even be necessary to transmit up to 16 motion vectors per
macroblock. Considering this additional Mode Overhead that P frames in H.26L may contain, an implementation of
Direct Prediction of the motion vectors could be viable. In such a way, all bits for the motion vectors and for the
reference frame used can be saved at only the cost of the additional mode, for example, see Fig. 4.

[0104] Even though a more straightforward method of Direct P prediction is to select the Motion vectors of the
collocated pixels in the previous frame, in other implementations one may also consider Motion Acceleration as an
alternative solution. This is motivated by the fact that motion may change from frame to frame rather than remain
constant, so that better results could be obtained by using acceleration, for example, see Fig. 7.
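
One possible way to picture the difference between plain projection and Motion Acceleration is sketched below. The acceleration formula is an assumption made for illustration (a first-order model that adds the most recent change in motion to the last vector); the text above does not spell out the exact model, and the function names are hypothetical.

```python
from typing import Tuple

MV = Tuple[int, int]


def project_mv(prev_mv: MV) -> MV:
    """Plain Direct P prediction: reuse the collocated vector of the previous frame."""
    return prev_mv


def accelerate_mv(prev_mv: MV, prev2_mv: MV) -> MV:
    """Assumed acceleration model: MV(t) = MV(t-1) + (MV(t-1) - MV(t-2))."""
    return (2 * prev_mv[0] - prev2_mv[0], 2 * prev_mv[1] - prev2_mv[1])
```

When motion is constant the two predictions coincide; they differ only when the collocated vector changed between the two previous frames, which is also why the second previous frame's motion would need to be buffered.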

[0105] Such techniques can be further applied to progressive video coding. Still, considering the correlation that
fields may have in some cases inside interlace sequences, such as for example regions with constant horizontal-only
movement, this approach can also help improve coding efficiency for interlace sequence coding. This is in particular
beneficial for known field type frames, for example, if it is assumed that the motion of adjacent fields is the same. In
this type of arrangement, same parity fields can be considered as new frames and are sequentially coded without
taking consideration of the interlace feature. Such is entirely left to the decoder. By using this exemplary Direct P
mode, though, one can use one set of motion vectors for the first to be coded field macroblock (e.g., of size 16x16
pixels), whereas the second field at the same location reuses the same motion information. The only other information
necessary to be sent is the coded residue image. In other implementations, it is possible to further improve upon these
techniques by considering correlations between the residue images of the two collocated field Blocks.

[0106] In order to allow Direct Mode in P frames, it is basically necessary to add one additional Inter Mode into the
system. Thus, instead of having only 8 Inter Modes, in one example, one can now use 9, which are shown below:



INTER MODES      Description
COPY_MB 0        Skip macroblock Mode
M16x16_MB 1      One 16x16 block
M16x8_MB 2       Two 16x8 blocks
M8x16_MB 3       Two 8x16 blocks
M8x8_MB 4        Four 8x8 blocks
M8x4_MB 5        Eight 8x4 blocks
M4x8_MB 6        Eight 4x8 blocks
M4x4_MB 7        Sixteen 4x4 blocks
PDIRECT_MB 8     Copy Mode and motion vectors of collocated macroblock in previous frame



[0107] In general, such exemplary Direct Modes for P frames can appear if the collocated macroblock was also of
INTER type, except Skip macroblock, but including Direct Mode, since in other cases there is no motion information
that could be used. In the case of the previous macroblock also being coded in Direct P Mode, the most recent Motion
Vectors and Mode for this macroblock are considered instead. To handle more efficiently the cases in which this
Mode will not logically appear, and in particular if INTRA mode was used, one may allow this Mode to also
appear in such cases, with the Mode now signifying a second Skip Macroblock Mode where the information is copied
not from the previous frame, but from the one before it. In this case, no residue information is encoded. This is
particularly useful for Interlace sequences, since it is more likely that a macroblock can be found with higher
accuracy from the same parity field frame, and not from the previously coded field frame as was presented in
previous techniques.
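
The mode numbering and the two meanings of the Direct P mode can be summarized in a brief sketch. The enumeration mirrors the nine Inter Modes listed above; the function name and the returned labels are hypothetical, and treating a collocated Skip macroblock the same way as an INTRA one is an assumption drawn from the remark that no usable motion information exists in those cases.

```python
from enum import IntEnum
from typing import Optional


class InterMode(IntEnum):
    COPY_MB = 0
    M16x16_MB = 1
    M16x8_MB = 2
    M8x16_MB = 3
    M8x8_MB = 4
    M8x4_MB = 5
    M4x8_MB = 6
    M4x4_MB = 7
    PDIRECT_MB = 8


def pdirect_meaning(colocated: Optional[InterMode]) -> str:
    """Interpretation of PDIRECT_MB given the collocated macroblock's mode.

    `colocated` is None when the collocated macroblock was INTRA coded.
    """
    if colocated is None or colocated is InterMode.COPY_MB:
        # No motion to reuse: act as a second Skip mode that copies from
        # the frame before the previous one, with no residue.
        return "skip_from_second_previous_frame"
    # INTER (including Direct P): reuse the collocated mode and motion vectors.
    return "reuse_colocated_motion"
```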
[0108] For further improved efficiency, if a set of two Field type frames is used when coding interlace images, the
Skip Macroblock Mode can be configured to use the same parity field images. If Direct P mode is used as a skipping
flag, for example, then the different parity is used instead. An additional benefit of Direct P mode is that it may
allow for a significant complexity reduction in the encoder, since it is possible to allow the system to perform a
pre-check of whether the Direct P mode gives a satisfactory enough solution, and if so, no additional computation may
be necessary for the mode decision and motion estimation of that particular block. To also address the issue of motion
vector coding, the motion vectors used for Direct P coding can be used "as is" for the calculation of a MEDIAN predictor.


Best Field First technique & Field Reshuffling: 

[0109] Coding of interlaced sequences allowing support of both interlace frame material and separate interlace field
images inside the same stream would likely provide a much better solution than coding using only one of the two
methods. The separate interlace field technique has some additional benefits, such as, for example, de-blocking, and
in particular can provide enhanced error resilience. If an error happens inside one field image, for example, the error
can be easily concealed using the information from the second image.

[0110] This is not the case for the frame based technique, where, especially when considering the often large size
of, and number of bits used by, such frames, errors inside such a frame can happen with much higher probability.
Reduced correlation between pixels/blocks may not promote error recovery.

[0111] Here, one can further improve on the field/frame coding concept by allowing the encoder to select which field
should be encoded first, while disregarding which field is to be displayed first. This could be handled automatically on
a decoder, where a larger buffer will be needed for storing a future field frame before displaying it. For example, even
though the top field precedes the bottom field in terms of time, the coding efficiency might be higher if the bottom field
is coded and transmitted first, followed by the top field frame. The decision may be made, for example, in the Rate
Distortion Optimization process/phase, where one first examines what the performance will be if the Odd field is coded
first followed by the Even field, and then the performance if the Even field is instead coded first and is used as a reference
for the Odd field. Such a method implies that both the encoder and the decoder should know which field should be
displayed first, and that any reshuffling is done seamlessly. It is also important that even though the Odd field was coded first,
both encoder and decoder are aware of this change when indexing the frame for the purpose of INTER/INTRA prediction.
Illustrative examples of such a prediction scheme, using 4 reference frames, are depicted in Fig. 17 and Fig.
18. In Fig. 17, interlace coding is shown using an exemplary Best Field First scheme in P frames. In Fig. 18, interlace
coding is shown using a Best Field First scheme in B frames.
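As a non-limiting sketch, the field-order decision described above could be expressed as a comparison of two trial encodes. The Lagrangian cost J = D + λR is an assumed formulation of the Rate Distortion Optimization step, and `encode_pair` is a hypothetical stand-in for the encoder's trial-coding routine; neither is specified in the description.

```python
from typing import Callable, Tuple

# encode_pair(first, second) is a hypothetical trial encode: it codes
# `first`, then codes `second` with `first` available as a reference,
# and returns (total_distortion, total_bits) for the pair.
EncodePair = Callable[[str, str], Tuple[float, float]]


def rd_cost(distortion: float, bits: float, lam: float) -> float:
    """Lagrangian cost J = D + lambda * R (assumed RDO formulation)."""
    return distortion + lam * bits


def best_field_first(encode_pair: EncodePair, lam: float) -> Tuple[str, str]:
    """Return (field_coded_first, field_displayed_first).

    The display order is fixed (top before bottom, as in the example in
    the text); only the coding order is selected.
    """
    d_tb, r_tb = encode_pair("top", "bottom")
    d_bt, r_bt = encode_pair("bottom", "top")
    j_top_first = rd_cost(d_tb, r_tb, lam)
    j_bottom_first = rd_cost(d_bt, r_bt, lam)

    # On a tie, keep display order so that no reshuffling is required.
    coded_first = "top" if j_top_first <= j_bottom_first else "bottom"
    return coded_first, "top"
```

The returned pair carries both orders because, as noted above, the decoder must know the display order independently of the coding order in order to reshuffle the fields and index the references consistently.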
[0112] In the case of coding joint field/frame images, the scheme illustratively depicted in Fig. 19 may be employed.
Here, an exemplary implementation of a Best Field First scheme with frame and field based coding is shown. If two
frames are used for the frame based motion estimation, then at least five field frames can be used for motion estimation
of the fields, especially if field swapping occurs. This allows referencing of at least two field frames of the same parity.
In general, 2×N+1 field frames should be stored if N full frames are to be used. Frames also could easily be interleaved
and deinterleaved on the encoder and decoder for such processes.
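The buffer-size relation stated above can be checked with a short worked example. The function name is hypothetical and the sketch only evaluates the 2×N+1 rule from the text; it does not model the buffer itself.

```python
def field_frames_to_store(n_full_frames: int) -> int:
    """Field frames to buffer when N full frames are used: 2*N + 1."""
    if n_full_frames < 0:
        raise ValueError("n_full_frames must be non-negative")
    return 2 * n_full_frames + 1


# Two full frames give five field frames, matching the case in the text.
for n in (1, 2, 4):
    print(n, field_frames_to_store(n))
```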

Conclusion 

[0113] Although the description above uses language that is specific to structural features and/or methodological
acts, it is to be understood that the invention defined in the appended claims is not limited to the specific features or
acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the invention.



Claims

1. A method for use in encoding video data within a sequence of video frames, the method comprising:

encoding at least a portion of at least one reference frame to include motion information associated with said
portion of said reference frame;

defining at least a portion of at least one predictable frame that includes video data predictively correlated to 
said portion of said reference frame based on said motion information; and 

encoding at least said portion of said predictable frame without including corresponding motion information
and including mode identifying data that identifies that said portion of said predictable frame can be directly
derived using at least said motion information associated with said portion of said reference frame.

2. The method as recited in Claim 1, wherein said mode identifying data defines a type of prediction model required
to decode said encoded portion of said predictable frame.

3. The method as recited in Claim 2, wherein said type of prediction model includes an enhanced Direct Prediction
model that includes at least one submode selected from a group comprising a Motion Projection submode, a Spatial
Motion Vector Prediction submode, and a weighted average submode.

4. The method as recited in Claim 3, wherein said mode identifying data identifies said at least one submode. 

5. The method as recited in Claim 1, wherein said method generates a plurality of video frames comprising at least
one predictable frame selected from a group of predictable frames comprising a P frame and a B frame.

6. The method as recited in Claim 1, wherein said portion of said reference frame includes data for at least one
pixel within said reference frame, and said portion of said predictable frame includes data for at least one pixel
within said predictable frame.

7. The method as recited in Claim 6, wherein said portion of said reference frame includes data for at least a portion
of a macroblock within said reference frame, and said portion of said predictable frame includes data for at least
a portion of a macroblock within said predictable frame.

8. The method as recited in Claim 1, wherein said reference frame temporally precedes said predictable frame
within said sequence of video frames.

9. The method as recited in Claim 1, wherein said motion information associated with said portion of said reference
frame includes velocity information.

10. The method as recited in Claim 1, wherein said motion information associated with said portion of said reference
frame includes acceleration information.

11. The method as recited in Claim 1, wherein said portion of said reference frame and said portion of said
predictable frame are spatially correlated.

12. The method as recited in Claim 1, wherein said motion information associated with said portion of said reference
frame includes Pixel Projection information required to decode said encoded portion of said predictable frame.

13. The method as recited in Claim 1, wherein said motion information associated with said portion of said reference
frame includes Spatial Motion Vector Prediction information required to decode said encoded portion of said
predictable frame.

14. The method as recited in Claim 1, wherein said motion information associated with said portion of said reference
frame includes combined Pixel Projection and Spatial Motion Vector Prediction information required to decode
said encoded portion of said predictable frame.

15. The method as recited in Claim 1, wherein said motion information associated with said portion of said reference
frame includes multi-hypothesis prediction information required to decode said encoded portion of said predictable
frame.

16. The method as recited in Claim 1, wherein said motion information associated with said portion of said reference
frame is null and said mode identifying data identifies that said portion of said reference frame includes said portion
of said predictable frame.

17. The method as recited in Claim 1, wherein said motion information associated with said portion of said reference
frame includes corresponding residue information.

18. The method as recited in Claim 1, wherein said motion information associated with said portion of said reference
frame includes corresponding residue information only if a quantization parameter (QP) meets at least one defined
condition.

19. The method as recited in Claim 18, wherein said at least one defined condition includes a threshold value. 

20. The method as recited in Claim 18, wherein said a threshold value is about QP > twenty-three. 

21. The method as recited in Claim 1, wherein said at least one predictable frame and reference frame are part
of an interlaced sequence of video fields.

22. The method as recited in Claim 21, wherein motion information is associated with at least one collocated pixel
in said reference frame.

23. The method as recited in Claim 22, wherein encoding at least a portion of said at least one reference frame 
further includes encoding based on a correlation between residue images of two collocated field blocks. 

23. The method as recited in Claim 21, further comprising for each of said reference frame and said predictable
frame selecting an order in which fields within said interlaced sequence of video fields are to be encoded.

24. The method as recited in Claim 21, wherein said at least one predictable frame and reference frame each have
at least two fields associated with them.

25. The method as recited in Claim 1, further comprising:

selectively determining if a direct prediction mode is used instead of a pre-existing mode during said encoding 
of said at least said portion of said predictable frame based on at least one factor. 

26. A computer-readable medium having computer-implementable instructions for performing acts comprising: 

encoding video data for a sequence of video frames into at least one predictable frame selected from a group 
of predictable frames comprising a P frame and a B frame, by: 

encoding at least a portion of at least one reference frame to include motion information associated with 
said portion of said reference frame; 

defining at least a portion of at least one predictable frame that includes video data predictively correlated
to said portion of said reference frame based on said motion information; and

encoding at least said portion of said predictable frame without including corresponding motion information 
and including mode identifying data that identifies that said portion of said predictable frame can be directly 
derived using at least said motion information associated with said portion of said reference frame. 

27. The computer-readable medium as recited in Claim 26, wherein said mode identifying data defines a type of 
prediction model required to decode said encoded portion of said predictable frame. 

28. The computer-readable medium as recited in Claim 27, wherein said type of prediction model includes an 
enhanced Direct Prediction model that includes at least one submode selected from a group comprising a Motion 
Projection submode, a Spatial Motion Vector Prediction submode, and a weighted average submode, and wherein 
said mode identifying data identifies said at least one submode. 

29. The computer-readable medium as recited in Claim 26, wherein said portion of said reference frame includes 
data for at least one pixel within said reference frame, and said portion of said predictable frame includes data for 
at least one pixel within said predictable frame. 

30. The computer-readable medium as recited in Claim 26, wherein said motion information associated with said 
portion of said reference frame includes information selected from a group comprising velocity information and 
acceleration information. 



31. The computer-readable medium as recited in Claim 26, wherein said motion information associated with said 
portion of said reference frame includes information required to decode said encoded portion of said predictable 
frame that is selected from a group comprising: 

Pixel Projection information; 

Spatial Motion Vector Prediction information; 

Weighted Pixel Projection and Spatial Motion Vector Prediction information; and 
Multi-hypothesis prediction information. 

32. The computer-readable medium as recited in Claim 26, wherein said motion information associated with said
portion of said reference frame is null and said mode identifying data identifies that said portion of said reference
frame includes said portion of said predictable frame.

33. The computer-readable medium as recited in Claim 26, wherein said motion information associated with said
portion of said reference frame includes corresponding residue information.

34. The computer-readable medium as recited in Claim 26, wherein said motion information associated with said
portion of said reference frame includes corresponding residue information only if a quantization parameter (QP)
meets at least one defined condition.

35. The computer-readable medium as recited in Claim 34, wherein said at least one defined condition includes 
a threshold value. 

36. The computer-readable medium as recited in Claim 26, wherein said at least one predictable frame and
reference frame are part of an interlaced sequence of video fields.

37. The computer-readable medium as recited in Claim 36, wherein motion information is associated with at least 
one collocated pixel in said reference frame. 

38. The computer-readable medium as recited in Claim 37, wherein encoding at least a portion of said at least 
one reference frame further includes encoding based on a correlation between residue images of two collocated 
field blocks. 



39. The computer-readable medium as recited in Claim 36, further comprising for each of said reference frame
and said predictable frame selecting an order in which fields within said interlaced sequence of video fields are to
be encoded.

40. The computer-readable medium as recited in Claim 36, wherein said at least one predictable frame and
reference frame each have at least two fields associated with them.

41. The computer-readable medium as recited in Claim 26, having computer-implementable instructions for
performing further acts comprising:

selectively determining if a direct prediction mode is used instead of a pre-existing mode during said encoding 
of said at least said portion of said predictable frame based on at least one factor. 

42. An apparatus for use in encoding video data for a sequence of video frames into a plurality of video frames
including at least one predictable frame selected from a group of predictable frames comprising a P frame and a
B frame, said apparatus comprising:

memory for storing motion information; and 

logic operatively coupled to said memory and configured to encode at least a portion of at least one reference
frame to include motion information associated with said portion of said reference frame, determine at least
a portion of at least one predictable frame that includes video data predictively correlated to said portion of
said reference frame based on said motion information, and encode at least said portion of said predictable
frame without including corresponding motion information and including mode identifying data that identifies
that said portion of said predictable frame can be directly derived using at least said motion information
associated with said portion of said reference frame.

43. The apparatus as recited in Claim 42, wherein said mode identifying data defines a type of prediction model 
required to decode said encoded portion of said predictable frame. 

44. The apparatus as recited in Claim 43, wherein said type of prediction model includes an enhanced Direct
Prediction model that includes at least one submode selected from a group comprising a Motion Projection
submode, a Spatial Motion Vector Prediction submode, and a weighted average submode, and wherein said mode
identifying data identifies said at least one submode.

45. The apparatus as recited in Claim 42, wherein said portion of said reference frame includes data for at least 
one pixel within said reference frame, and said portion of said predictable frame includes data for at least one pixel 
within said predictable frame. 

46. The apparatus as recited in Claim 42, wherein said motion information associated with said portion of said
reference frame includes information selected from a group comprising velocity information and acceleration
information.

47. The apparatus as recited in Claim 42, wherein said motion information associated with said portion of said
reference frame includes information required to decode said encoded portion of said predictable frame that is
selected from a group comprising:

Pixel Projection information;
Spatial Motion Vector Prediction information;
Weighted Pixel Projection and Spatial Motion Vector Prediction information; and
Multi-hypothesis prediction information.

48. The apparatus as recited in Claim 42, wherein said motion information associated with said portion of said
reference frame is null and said mode identifying data identifies that said portion of said reference frame includes
said portion of said predictable frame.

49. The apparatus as recited in Claim 42, wherein said motion information associated with said portion of said 
reference frame includes corresponding residue information. 

50. The apparatus as recited in Claim 42, wherein said motion information associated with said portion of said
reference frame includes corresponding residue information only if a quantization parameter (QP) meets at least
one defined condition.

51. The apparatus as recited in Claim 42, wherein said at least one predictable frame and reference frame are
part of an interlaced sequence of video fields.

52. The apparatus as recited in Claim 51, wherein motion information is associated with at least one collocated
pixel in said reference frame.

53. The apparatus as recited in Claim 52, wherein said logic encodes said at least a portion of said at least one 
reference frame based on a correlation between residue images of two collocated field blocks. 

54. The apparatus as recited in Claim 51, wherein said logic is further configured to, for each of said reference
frame and said predictable frame, select an order in which fields within said interlaced sequence of video fields
are encoded.

55. The apparatus as recited in Claim 51, wherein said at least one predictable frame and reference frame each
have at least two fields associated with them.

56. The apparatus as recited in Claim 42, wherein said logic is further configured to selectively determine if a direct 
prediction mode is used instead of a pre-existing mode when encoding said at least said portion of said predictable 
frame based on at least one factor. 

57. A method for use in decoding encoded video data that includes a plurality of video frames comprising at least 
one predictable frame selected from a group of predictable frames comprising a P frame and a B frame, the method 
comprising:

determining motion information associated with at least a portion of at least one reference frame;

buffering said motion information; 

determining mode identifying data that identifies that at least a portion of a predictable frame can be directly 
derived using at least said buffered motion information; and 

generating said portion of said predictable frame using said buffered motion information. 

58. The method as recited in Claim 57, wherein said mode identifying data defines a type of prediction model 
required to decode said encoded portion of said predictable frame. 

59. The method as recited in Claim 58, wherein said type of prediction model includes an enhanced Direct
Prediction model that includes at least one submode selected from a group comprising a Motion Projection submode,
a Spatial Motion Vector Prediction submode, and a weighted average submode.

60. The method as recited in Claim 59, wherein said mode identifying data identifies said at least one submode. 

61. The method as recited in Claim 57, wherein said portion of said reference frame includes data for at least one
pixel within said reference frame, and said portion of said predictable frame includes data for at least one pixel
within said predictable frame.

62. The method as recited in Claim 61, wherein said portion of said reference frame includes data for at least a
portion of a macroblock within said reference frame, and said portion of said predictable frame includes data for
at least a portion of a macroblock within said predictable frame.

63. The method as recited in Claim 57, wherein said reference frame temporally precedes said predictable frame 
within said sequence of video frames. 

64. The method as recited in Claim 57, wherein said motion information associated with said portion of said
reference frame includes information selected from a group comprising velocity information and acceleration
information.

65. The method as recited in Claim 57, wherein said portion of said reference frame and said portion of said
predictable frame are spatially correlated.

66. The method as recited in Claim 57, wherein said motion information associated with said portion of said
reference frame includes information selected from a group comprising Pixel Projection information, Spatial Motion
Vector Prediction, combined Pixel Projection and Spatial Motion Vector Prediction information, and
multi-hypothesis prediction information.

67. The method as recited in Claim 57, wherein said motion information associated with said portion of said
reference frame is null and said mode identifying data identifies that said portion of said reference frame includes
said portion of said predictable frame.

68. A computer-readable medium having computer-implementable instructions for performing acts comprising: 

decoding encoded video data that includes a plurality of video frames comprising at least one predictable 
frame selected from a group of predictable frames comprising a P frame and a B frame, by: 

buffering motion information associated with at least a portion of at least one reference frame; 
determining mode identifying data that identifies that at least a portion of a predictable frame can be directly
derived using at least said buffered motion information; and

generating said portion of said predictable frame using said buffered motion information. 

69. The computer-readable medium as recited in Claim 68, wherein said mode identifying data defines a type of
prediction model required to decode said encoded portion of said predictable frame.

70. The computer-readable medium as recited in Claim 69, wherein said type of prediction model includes an 
enhanced Direct Prediction model that includes at least one submode selected from a group comprising a Motion 
Projection submode, a Spatial Motion Vector Prediction submode, and a weighted average submode. 

71. The computer-readable medium as recited in Claim 70, wherein said mode identifying data identifies said at 
least one submode. 

72. The computer-readable medium as recited in Claim 68, wherein said portion of said reference frame includes
data for at least one pixel within said reference frame, and said portion of said predictable frame includes data for
at least one pixel within said predictable frame.

73. The computer-readable medium as recited in Claim 72, wherein said portion of said reference frame includes
data for at least a portion of a macroblock within said reference frame, and said portion of said predictable frame
includes data for at least a portion of a macroblock within said predictable frame.

74. The computer-readable medium as recited in Claim 68, wherein said reference frame temporally precedes 
said predictable frame within said sequence of video frames. 

75. The computer-readable medium as recited in Claim 68, wherein said motion information associated with said
portion of said reference frame includes information selected from a group comprising velocity information and
acceleration information.

76. The computer-readable medium as recited in Claim 68, wherein said portion of said reference frame and said
portion of said predictable frame are spatially correlated.

77. The computer-readable medium as recited in Claim 68, wherein said motion information associated with said
portion of said reference frame includes information selected from a group comprising Pixel Projection information,
Spatial Motion Vector Prediction, combined Pixel Projection and Spatial Motion Vector Prediction information, and
multi-hypothesis prediction information.

78. The computer-readable medium as recited in Claim 68, wherein said motion information associated with said 
portion of said reference frame is null and said mode identifying data identifies that said portion of said reference 
frame includes said portion of said predictable frame. 



79. An apparatus for use in decoding video data for a sequence of video frames into a plurality of video frames
including at least one predictable frame selected from a group of predictable frames comprising a P frame and a
B frame, said apparatus comprising:

memory for storing motion information; and

logic operatively coupled to said memory and configured to buffer in said memory motion information associated
with at least a portion of at least one reference frame, ascertain mode identifying data that identifies that
at least a portion of a predictable frame can be directly derived using at least said buffered motion information,
and generate said portion of said predictable frame using said buffered motion information.

80. The apparatus as recited in Claim 79, wherein said mode identifying data defines a type of prediction model 
required to decode said encoded portion of said predictable frame. 

81. The apparatus as recited in Claim 80, wherein said type of prediction model includes an enhanced Direct
Prediction model that includes at least one submode selected from a group comprising a Motion Projection
submode, a Spatial Motion Vector Prediction submode, and a weighted average submode.

82. The apparatus as recited in Claim 81, wherein said mode identifying data identifies said at least one submode.

83. The apparatus as recited in Claim 79, wherein said portion of said reference frame includes data for at least 
one pixel within said reference frame, and said portion of said predictable frame includes data for at least one pixel 
within said predictable frame. 

84. The apparatus as recited in Claim 83, wherein said portion of said reference frame includes data for at least
a portion of a macroblock within said reference frame, and said portion of said predictable frame includes data for
at least a portion of a macroblock within said predictable frame.

85. The apparatus as recited in Claim 79, wherein said reference frame temporally precedes said predictable frame
within said sequence of video frames.

86. The apparatus as recited in Claim 79, wherein said motion information associated with said portion of said
reference frame includes information selected from a group comprising velocity information and acceleration
information.

87. The apparatus as recited in Claim 79, wherein said portion of said reference frame and said portion of said 
predictable frame are spatially correlated. 

88. The apparatus as recited in Claim 79, wherein said motion information associated with said portion of said
reference frame includes information selected from a group comprising Pixel Projection information, Spatial Motion
Vector Prediction, combined Pixel Projection and Spatial Motion Vector Prediction information, and
multi-hypothesis prediction information.

89. The apparatus as recited in Claim 79, wherein said motion information associated with said portion of said
reference frame is null and said mode identifying data identifies that said portion of said reference frame includes
said portion of said predictable frame.

90. The apparatus as recited in Claim 79, wherein said logic includes a codec. 
